BOX PCT

IN THE UNITED STATES DESIGNATED/ELECTED OFFICE 
OF THE UNITED STATES PATENT AND TRADEMARK OFFICE 
UNDER THE PATENT COOPERATION TREATY-CHAPTER II 

APPLICANT(S): MARTIN HOLZAPFEL 

ATTORNEY DOCKET NO.: P00,1796 

INTERNATIONAL APPLICATION NO: PCT/DE99/01308 

INTERNATIONAL FILING DATE: 03 MAY 1999

INVENTION: METHOD AND ARRANGEMENT FOR DETERMINING 
SPECTRAL SPEECH CHARACTERISTICS IN A SPOKEN 
EXPRESSION 

Assistant Commissioner for Patents, 
Washington, D.C. 20231 

AMENDMENT "A" PRIOR TO ACTION 

Sir: 

Applicants herewith amend the above-referenced PCT application, and
request entry of the Amendment prior to examination in the United States
National Examination Phase.

IN THE SPECIFICATION:
On substitute page 1: 

replace lines 1-2 with

--SPECIFICATION
TITLE

METHOD AND ARRANGEMENT FOR DETERMINING SPECTRAL
SPEECH CHARACTERISTICS IN A SPOKEN EXPRESSION
BACKGROUND OF THE INVENTION
Field of the Invention--;

above line 5, insert
--Description of the Related Art--;

in line 6, cancel "thereby";

in line 7, after "ear", insert --from these sounds--;

in line 8, replace ". In particular, the sounds are thereby" with --,
particularly where these sounds are--;

in line 10, replace "[1]" with --I. Daubechies, "Ten Lectures on
Wavelets", SIAM, 1992, ISBN 0-89871-274-2, Ch. 5.1, pp. 129-137--;

in line 12, replace ". A" with --, resulting in a--;

in line 13, cancel ", respectively,", and cancel "thereby";

in line 14, cancel "ensues", and cancel "English art term:";

in line 16, replace "US-A-5528725" with --U.S. Patent No. 5,528,725--;

in line 18, before "EP", insert --European Patent--;

above line 21, insert

--SUMMARY OF THE INVENTION--;

replace lines 24-25 with 

--This object is achieved by a method for determining spectral speech
characteristics in a spoken expression, comprising the steps of: a) digitizing the
expression; b) wavelet transforming the digitized expression; and c) defining
speaker-specific characteristics based on different transformation stages of the
wavelet transformation.--;

in line 26, replace "A method" with --The invention provides a method--;

and 

in line 27, cancel "is recited in the scope of the invention". 



On page 2: 

in line 1, cancel "thereby"; 
in line 2, after the last "filter", insert --,--;
in line 3, cancel ", respectively,";
in line 5, cancel ", respectively,";

in line 7, replace "whereby" with --where--;

in line 8, after "i.e.", insert --,--;

in line 13, cancel "thereby";

in line 14, cancel "comprised therein";

in line 15, after "i.e.", insert --,--;
in line 18, cancel "comprised therein";
in line 22, cancel "comprised therein";
in line 23, replace "be defined in" with --is defined such--; and
in line 25, replace "passes" with --pass parts--.

On page 3: 

in line 3, after the first "as", insert --a--;
in line 5, after "without", insert --a--;
in line 6, after "as", insert --a--;
in line 13, replace "as" with --is--;

in line 23, replace "aO" with --a)--; and
in line 27, cancel "thereby".

On page 4: 

in line 8, after "example", insert --,--;
in line 9, cancel "comprised therein";

in line 11, replace "Furthermore" with --Further--;

in line 12, replace "representation" with --representations--, and before the
last "the", insert --so--;

in line 14, before "can", insert --so they--;
in line 16, replace "-- loss-free" with --without loss--;

in line 17, replace "Further" with --Furthermore--;
in line 18, replace "recited" with --provided--;
in line 19, replace "digitalized" with --digitized--;
replace line 24 with 

--Advantageous embodiments include adding a step to the inventive
method of implementing a windowed transformation of the digitalized expression
into a frequency domain before the wavelet transformation, which may be
implemented with a fast Fourier transformation. An advantageous embodiment
may also include a step of determining a low-pass part and a high-pass part of a
signal to be transformed in each stage of the wavelet transformation. The
high-pass part can be subdivided into a real part and an imaginary part.

In the inventive method, the wavelet transformation may include a
plurality of transformation stages, a last transformation stage of the plurality of
transformation stages supplying a constant part of the expression in a repeated
low-pass filtering corresponding to the plurality of transformation stages.
Speaker-specific characteristics may be determined by: a basic frequency of the
spoken expression; spectral envelope; and/or a huskiness of the spoken
expression, and individual speaker-specific characteristics may be adapted to
provide a natural sounding concatenation of speech sounds.

An inventive method may be provided implementing the above method
for determining spectral speech characteristics comprising a step of selecting those
speech sounds from a predetermined data set that assure a natural sounding
concatenation of speech sounds on a basis of the individual spectral speech
characteristics.

Finally, the object of the invention may be achieved with an arrangement
for determining spectral speech characteristics in a spoken expression, comprising
a processor unit that is configured to digitize the expression, wavelet transform
the digitized expression, and define speaker-specific characteristics on a basis of
different transformation stages of the wavelet transformation.--;

above line 25, insert 

--BRIEF DESCRIPTION OF THE DRAWINGS--;
cancel line 27;

in line 28, before "a wavelet", insert --is a graph illustrating--; and

in line 29, before "a wavelet", insert --is a graph illustrating--.



On page 5: 

in line 1, before "a cascaded", insert --is a block diagram illustrating--;
in line 3, replace "Figure 4" with --Figures 4A-4F are graphs illustrating
frequency spectrums of--;

in line 4, before "steps", insert --are pictorial diagrams illustrating the--;
above line 5, insert

--DESCRIPTION OF THE PREFERRED EMBODIMENTS--;

replace line 6 with --where--;

in line 13, before "imaginary", insert --the--;

in line 17, replace "whereby" with --where--;

in line 19, after "high-pass", insert --part/filter--, after "low-pass", insert
--part/filter--, and replace "." with --, producing--; and

in line 20, replace "In" with --in--, and replace "thereby occurs, i.e." with



On page 6: 

in line 2, replace "304" with --302--;
in line 3, after "pass", insert --part--;
in line 4, replace "Im 1" with --Im1--; and

in line 6, after all instances of "pass", insert --part--.



On page 7: 

in line 2, replace "Mi" with --Min--;

in line 3, replace "Said" with --These--;
in lines 4-5, replace "thereby of particular significance" with

--particularly significant--;

in line 8, cancel "thereby";

in line 9, cancel "comprised";

in line 10, replace "whereby" with --where--;
in line 12, after "With", insert --a--;

in line 16, after "shows", insert --the--;

in lines 17-18, replace ", whereby" with --in which--;

in line 19, replace "representatives" with --representations--;

in line 22, cancel "all the"; and

in line 25, replace "thereby to be" with --also--, and replace "dependent"
with --depending--.

On page 8: 

in line 2, replace "ensue be adaptation" with --be implemented by
adapting--;

in line 4, replace ", whereby" with --in which--, and replace "represent"
with --are--;

in line 8, cancel "respectively", and replace "whereby" with --where--;
in line 9, replace "whereby" with --where--;
in line 16, replace "is comprised in" with --has--;
in line 17, replace ". However, said" with --which are--;
in line 18, replace "thereby occur. When, in" with --. In--, and replace
"[...]" with --so--;

in line 19, replace "whereby" with --where--;
in line 21, before "gradual", insert --a--;
replace lines 25-27 with

--The above-described apparatus and method are illustrative of the
principles of the present invention. Numerous modifications and adaptions
thereof will be readily apparent to those skilled in this art without departing from
the spirit and scope of the present invention.--.

IN THE CLAIMS:

On page 9: 

replace line 1 with --WHAT IS CLAIMED IS:--;
Please amend claims 1-10 as follows: 

1. (Amended) A method [Method] for determining spectral speech
characteristics in a spoken expression, comprising the steps of:
a) digitizing said [whereby the] expression [is digitalized];
b) wavelet transforming said digitized [whereby the digitalized] expression
[is subjected to a wavelet transformation]; and
c) defining [whereby the] speaker-specific characteristics based on [are
defined on the basis of] different transformation stages of said [the]
wavelet transformation.

2. (Amended) The method [Method] according to claim 1, further
comprising the step of implementing [whereby] a windowed transformation of
said [the] digitalized expression into a frequency domain [is implemented] before
said [the] wavelet transformation.

3. (Amended) The method [Method] according to claim 2, wherein said
step of implementing said windowed transformation is implemented [whereby the
transformation into the frequency domain is implemented] with a fast Fourier
transformation.

4. (Amended) The method [Method] according to claim 1, further
comprising the step of: [one of the preceding claims, whereby]
determining a low-pass part and a high-pass part of a signal to be
transformed [are determined] in each stage of said [the] wavelet transformation.

5. (Amended) The method [Method] according to claim 1, further
comprising the step of: [one of the preceding claims, whereby]
subdividing a high-pass part into [is subdivided according to] a real part
and an imaginary part.

6. (Amended) The method [Method] according to claim 1, wherein said
step of wavelet transformation further comprises [one of the preceding claims,
whereby the wavelet transformation comprises] a plurality of transformation
stages, a [whereby the] last transformation stage of said plurality of
transformation stages supplying [supplies] a constant part of said [the] expression
in a repeated low-pass filtering corresponding to said [the] plurality of
transformation stages.

7. (Amended) The method [Method] according to claim 1 [one of the
preceding claims], wherein said [whereby the] speaker-specific characteristics are
determined by an attribute selected from the group consisting of

a) a basic frequency of the spoken expression;

b) spectral envelope; and

c) a huskiness of the spoken expression.

8. (Amended) The [Employment of the] method according to claim 1,
further comprising the step of [one of the claims 1 through 7 for speech synthesis,
whereby] adapting individual speaker-specific characteristics [are adapted in view
of] to provide a natural sounding concatenation of speech sounds.

9. (Amended) A [Employment of the] method for implementing the
method according to claim 1, comprising the step of: [according to one of the
claims 1 through 7 for speech synthesis, whereby]

selecting those speech sounds from a predetermined data set that assure a
natural sounding concatenation of speech sounds [are selected] on a [the] basis of
individual said spectral speech characteristics.

10. (Amended) An arrangement [Arrangement] for determining spectral
speech characteristics in a spoken expression, comprising:

a processor unit that is configured to digitize said expression, wavelet
transform said digitized expression, and define speaker-specific characteristics on
a [such that the following steps can be implemented:
a) the expression is digitalized;

b) the digitalized expression is subjected to a wavelet transformation;

c) the speaker-specific characteristics are defined on the]
basis of different transformation stages of the wavelet transformation.



IN THE ABSTRACT:
On page 11:

cancel lines 2-3; and

in line 5, replace "whereby" with --where--.



REMARKS 

The present Amendment revises the specification and claims to conform
to United States patent practice, before examination of the present PCT
application in the United States National Examination Phase. All of the changes
are editorial and applicant believes no new matter is added thereby. The
amendment of claims 1-10 is not intended to be a surrender of any of the subject
matter of those claims.

Early examination on the merits is respectfully requested. 



Submitted by, 



(Reg. No. 45,877)



Mark Bergner 
Schiff Hardin & Waite 
Patent Department 
6600 Sears Tower 
233 South Wacker Drive 
Chicago, Illinois 60606-6473 
(312) 258-5779 
Attorneys for Applicant

REQUEST FOR APPROVAL OF DRAWING ADDITIONS



Enclosed are 3 sheets of drawings, Figures 3-5, showing in red the
addition of labels to the elements depicted therein. Approval of the additions is 
respectfully requested. 

Submitted by, 

(Reg. No. 45,877)
Mark Bergner
SCHIFF HARDIN & WAITE 
PATENT DEPARTMENT 
6600 Sears Tower 
Chicago, Illinois 60606-6473 
(312) 258-5779 
Attorney for Applicant(s)

METHOD AND ARRANGEMENT FOR DETERMINING SPECTRAL
SPEECH CHARACTERISTICS IN A SPOKEN EXPRESSION 



The invention is directed to a method and to an arrangement for 
determining spectral speech characteristics in a spoken expression. 
In a concatenative speech synthesis, individual sounds are combined from

speech data banks. In order to thereby obtain a speech curve that sounds natural to the 
human ear, discontinuities must be avoided at the points where the sounds are
combined (concatenation points). In particular, the sounds are thereby phonemes of a 
language or a combination of a plurality of phonemes. 

[1] discloses a wavelet transformation. In wavelet transformation, a

wavelet filter assures that a respective high-pass part and low-pass part of a following 
transformation stage completely restore a signal of a current transformation stage. A 
reduction of the resolution of the high-pass part or, respectively, low-pass part thereby 
ensues from one transformation stage to the next (English art term: "sub-sampling"). 

In particular, the plurality of transformation stages is finite due to the sub-sampling.
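By way of illustration only, the perfect-reconstruction property and the sub-sampling described above can be sketched with the simplest wavelet filter pair, the Haar filters. The Haar pair and the function names haar_analysis and haar_synthesis are assumptions chosen for brevity; they are not the complex wavelet of the embodiments described below.

```python
import math

SQRT2 = math.sqrt(2.0)


def haar_analysis(signal):
    """One transformation stage: low-pass and high-pass parts, each sub-sampled by 2."""
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    low, high = [], []
    for a, b in zip(signal[0::2], signal[1::2]):
        low.append((a + b) / SQRT2)   # low-pass (smoothing) coefficient
        high.append((a - b) / SQRT2)  # high-pass (detail) coefficient
    return low, high


def haar_synthesis(low, high):
    """Inverse stage: restores the signal of the current stage from both parts."""
    signal = []
    for lo, hi in zip(low, high):
        signal.append((lo + hi) / SQRT2)
        signal.append((lo - hi) / SQRT2)
    return signal


x = [4.0, 2.0, 5.0, 5.0, 1.0, 3.0, 0.0, 2.0]
low, high = haar_analysis(x)
restored = haar_synthesis(low, high)
print(len(x), len(low), len(high))  # 8 4 4
print(max(abs(a - b) for a, b in zip(x, restored)) < 1e-12)  # True
```

Each part carries half as many values as the input (the sub-sampling), yet the two parts together restore the input exactly.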

US-A-5528725 discloses a method for speech recognition with wavelet
transformations. 

EP-A-0519802 discloses a method for speech synthesis that adapts
speaker-specific characteristics in view of a natural sounding concatenation of speech
sounds.

The object of the invention is comprised in specifying a method and an 
arrangement for determining spectral speech characteristics with whose assistance, in 
particular, a speech output that sounds natural can be determined. 

This object is achieved according to the features of the independent
claims.

A method for determining spectral speech characteristics in a spoken 
expression is recited in the scope of the invention. To that end, the spoken expression 
is digitalized and subjected to a wavelet transformation. The speaker-specific 
characteristics are determined on the basis of different transformation stages of the
wavelet transformation.



2 

One advantage, in particular, is thereby that the expression is divided in 
the wavelet transformation with a high-pass filter and a low-pass filter and different 
high-pass parts or, respectively, low-pass parts of different transformation stages 
contain speaker-specific characteristics. 
The individual high-pass parts or, respectively, low-pass parts of different

transformation stages stand for predetermined speaker-specific characteristics, 
whereby both high-pass part as well as low-pass part of a respective transformation 
stage, i.e. the respective characteristic, can be modified separately from other 
characteristics. When, in inverse wavelet transformation, the original signal is in turn 

combined from the respective high-pass and low-pass parts of the individual

transformation stages, then it is assured that it is exactly the desired characteristic that 
has been modified. It is thus possible to modify certain predetermined peculiarities of 
the expression without the rest of the expression being thereby influenced. 

One development is comprised therein that the expression is windowed 

before the wavelet transformation, i.e. a predetermined set of samples are cut out, and
is transformed into the frequency domain. In particular, a fast-Fourier-Transformation 
(FFT) is employed for this purpose. 
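A minimal sketch of the windowing and frequency transformation, assuming a periodic Hann window, a frame of 256 samples and a sampling rate of 8 kHz; none of these is fixed by the description. A small recursive radix-2 FFT is written out so the example needs only the Python standard library, and short_time_spectrum is a hypothetical helper name.

```python
import cmath
import math


def hann_window(n):
    """Periodic Hann window of length n (an assumed choice of window)."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * k / n) for k in range(n)]


def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    if n % 2:
        raise ValueError("length must be a power of two")
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def short_time_spectrum(samples, start, length):
    """Cut out `length` samples from `start`, window them, return |FFT| magnitudes."""
    frame = samples[start:start + length]
    if len(frame) != length:
        raise ValueError("not enough samples for the requested frame")
    windowed = [s * w for s, w in zip(frame, hann_window(length))]
    return [abs(c) for c in fft(windowed)]


fs = 8000.0   # assumed sampling rate in Hz
n = 256       # frame length, matching the 256-value example of the description
tone = [math.sin(2 * math.pi * 500.0 * t / fs) for t in range(1024)]
spectrum = short_time_spectrum(tone, start=100, length=n)
peak_bin = max(range(n // 2), key=lambda k: spectrum[k])
print(peak_bin, peak_bin * fs / n)  # 16 500.0
```

A 500 Hz tone lands in bin 16 because the bin spacing is 8000 / 256 = 31.25 Hz.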

A further development is comprised therein that a high-pass part of a 
transformation stage is split into a real part and an imaginary part. The high-pass part 

of the wavelet transformation corresponds to the difference signal between the current
low-pass part and the low-pass part of the preceding transformation stage. 

In particular, one development is comprised therein that the number of 
transformation stages of the wavelet transformation to be implemented be defined in 
that a constant part of the expression is contained in the last transformation stage, 

which is composed of series-connected low-passes. The signal as a whole can then be
presented by its wavelet coefficients. This corresponds to the complete 
transformation of the information of the signal excerpt into the wavelet space. 



When, in particular, only the respective low-pass part is further- 
transformed (with a high-pass and a low-pass filter), then the difference signal 
remains as high-pass part of a transformation stage, as explained above. When 
difference signals (high-pass parts) are accumulated over the transformation stages, 
then the information of the spoken expression without constant part is obtained in the 
last transformation stage as cumulative high-pass part. 
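The following sketch illustrates repeated low-pass filtering down to the constant part and the accumulation of the difference signals. It assumes a simplified stage (pairwise mean as low-pass, sub-sampling by 2, and sample-and-hold upsampling) in place of the wavelet filters of the embodiments; the helper names are hypothetical. For a spectrum of 256 values this yields eight stages, in line with the example given for Figure 3.

```python
import math


def low_pass_and_subsample(values):
    """Simplified low-pass stage: mean of each pair, i.e. sub-sampling by 2."""
    return [(a + b) / 2.0 for a, b in zip(values[0::2], values[1::2])]


def hold(values, factor):
    """Repeat each value `factor` times (sample-and-hold upsampling)."""
    return [v for v in values for _ in range(factor)]


def cascade_to_constant(values):
    """Low-pass repeatedly until a single value, the constant part, remains.

    Returns that constant and the difference signals (high-pass parts) of all
    stages, each held back to full resolution so they can be accumulated.
    """
    n = len(values)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    current, factor, differences = list(values), 1, []
    while len(current) > 1:
        low = low_pass_and_subsample(current)
        # difference signal between this stage and its own low-pass part
        diff = [c - l for c, l in zip(current, hold(low, 2))]
        differences.append(hold(diff, factor))
        current, factor = low, factor * 2
    return current[0], differences


spectrum = [math.sin(0.3 * k) + 0.01 * k for k in range(256)]
constant, highs = cascade_to_constant(spectrum)
cumulative = [sum(column) for column in zip(*highs)]
print(len(highs))  # 8
print(max(abs(c + constant - s) for c, s in zip(cumulative, spectrum)) < 1e-9)  # True
```

The accumulated difference signals equal the input minus its constant part, because the sum telescopes from the first stage down to the final single value.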

In the scope of an additional development, the speaker-specific 
characteristics can be identified as: 

a) Basic Frequency: 

The oscillation of the high-pass part of the first or of the second 
transformation stage of the wavelet transformation allows the basic 
frequency of the expression to be recognized. The basic frequency 
indicates whether the speaker as a man or a woman. 

b) Shape of the Spectral Envelope: 

The spectral envelope contains information about a transfer 
function of the vocal tract in the articulation. The spectral 
envelope is dominated by the formants in a voiced region. The 
high-pass part of a higher transformation stage of the wavelet 
transformation contains this spectral envelope. 

c) Spectral Tilt (Huskiness): 

The huskiness in a voice is visible as negative slope in the curve of
the penultimate low-pass part.
The speaker-specific characteristics aO through c) are of great significance 
in the speech synthesis. As initially mentioned, large sets of actually spoken 
expressions from which exemplary sounds are excerpted and later combined to form a 
new word are used in concatenative speech synthesis (synthetic speech). 
Discontinuities between combined sounds are thereby disadvantageous since the 
human ear perceives these as being unnatural. In order to oppose discontinuities, it is



advantageous to directly acquire the perceptively relevant quantities and, potentially, 
to compare and/or adapt them to one another. 

This can occur by direct manipulation in that a speech sound is adapted at 
least in terms of its speaker-specific characteristics, so that it is not perceived as being 
disturbing in the acoustic context of the concatenatively linked sounds. It is also
possible to direct the selection of a suitable sound such that speaker-specific 
characteristics of sounds to be linked match one another as well as possible, for 
example that the same or similar huskiness is inherent in the sounds. 

One advantage of the invention is comprised therein that the spectral 
envelope reflects the articulation tract of the speaker and is not supported on formants
like, for example, a pole-point model. Further, no data are lost as non-parametric 
representation in the wavelet transformation, the expression can always be completely 
reconstructed. The data proceeding from the individual transformation stages of the 
wavelet transformation are linearly independent of one another, can thus be influenced 
separately from one another and be recombined later to form the influenced
expression — loss-free. 
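The separate modification and loss-free recombination can be checked numerically. This sketch again uses an orthonormal Haar decomposition as an assumed stand-in for the wavelet of the embodiments: only the high-pass part of the second stage is doubled, the signal is recombined, and a fresh decomposition shows the other stages unchanged. The names analyze and synthesize are hypothetical.

```python
import math

SQRT2 = math.sqrt(2.0)


def analyze(signal):
    """Full orthonormal Haar decomposition (length must be a power of two).

    Returns the high-pass parts of all stages (finest first) and the final low-pass part.
    """
    n = len(signal)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    highs, low = [], list(signal)
    while len(low) > 1:
        pairs = list(zip(low[0::2], low[1::2]))
        highs.append([(a - b) / SQRT2 for a, b in pairs])
        low = [(a + b) / SQRT2 for a, b in pairs]
    return highs, low


def synthesize(highs, low):
    """Inverse transform: recombine the stages from the coarsest to the finest."""
    current = list(low)
    for high in reversed(highs):
        finer = []
        for lo, hi in zip(current, high):
            finer.extend([(lo + hi) / SQRT2, (lo - hi) / SQRT2])
        current = finer
    return current


x = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0, 5.0, 3.0, 5.0, 8.0, 9.0, 7.0, 9.0, 3.0]
highs, low = analyze(x)
restored = synthesize(highs, low)          # unmodified recombination: loss-free

edited = [list(h) for h in highs]
edited[1] = [2.0 * v for v in edited[1]]   # modify only the second stage
highs2, low2 = analyze(synthesize(edited, low))

for stage, (before, after) in enumerate(zip(highs, highs2), start=1):
    change = max(abs(a - b) for a, b in zip(before, after))
    print(stage, round(change, 6))         # only stage 2 shows a non-zero change
```

Because the stage coefficients form an orthonormal basis in this stand-in, editing one stage cannot leak into the others on re-analysis.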

Further, an arrangement for determining spectral speech characteristics is 
recited that comprises a processor unit that is configured such that an expression can 
be digitalized. Subsequently, the expression is subjected to a wavelet transformation 
and speaker-specific characteristics are determined on the basis of different
transformation stages. 

This arrangement is particularly suited for the implementation of the
method or one of its developments explained above. 

Developments of the invention also derive from the dependent claims. 
Exemplary embodiments of the invention are presented and explained

below on the basis of the drawing. 

Shown are: 
Figure 1 a wavelet function; 

Figure 2 a wavelet function subdivided according to real part and imaginary part;
Figure 3 a cascaded filter structure that represents the transformation steps of the

wavelet transformation; 
Figure 4 low-pass parts and high-pass parts of different transformation stages; 
Figure 5 steps of the concatenative speech synthesis. 
Figure 1 shows a wavelet function that is defined by

$$\psi(f) = c \cdot \left(1 - \left(\frac{f}{\sigma}\right)^{2}\right) \cdot \exp\left(-\frac{1}{2}\left(\frac{f}{\sigma}\right)^{2}\right) \qquad (1),$$

whereby

f references the frequency,

σ references a standard deviation, and

c references a predetermined norming constant.
In particular, the standard deviation σ is defined by the prescribable
location of the sideband minimum 101 in Figure 1.
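Assuming Equation (1) has the form given above, its sideband minimum lies at f = sqrt(3) * σ: with u = f/σ, the derivative of (1 - u²)·exp(-u²/2) is u(u² - 3)·exp(-u²/2), which vanishes at u = sqrt(3). The standard deviation can therefore be set from a prescribed minimum location, as this sketch checks on a grid. The 300 Hz target and the function names are illustrative assumptions.

```python
import math


def wavelet(f, sigma, c=1.0):
    """Real wavelet of Equation (1): c * (1 - (f/sigma)^2) * exp(-(f/sigma)^2 / 2)."""
    u = f / sigma
    return c * (1.0 - u * u) * math.exp(-0.5 * u * u)


def sigma_for_sideband_minimum(f_min):
    """Sideband minimum of Equation (1) lies at f = sqrt(3) * sigma."""
    return f_min / math.sqrt(3.0)


sigma = sigma_for_sideband_minimum(300.0)   # prescribed minimum at 300 (assumed Hz)
grid = [0.5 * k for k in range(2001)]       # 0.0 ... 1000.0 in steps of 0.5
f_found = min(grid, key=lambda f: wavelet(f, sigma))
print(round(sigma, 2), f_found)  # 173.21 300.0
```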

Figure 2 shows a wavelet function with a real part according to Equation
(1) and a Hilbert transform H of the real part as imaginary part. The complex wavelet
function thus derives as

$$\Psi(f) = \psi(f) + j \cdot H\{\psi(f)\} \qquad (2).$$

The constant c from Equation (1) is employed in order to norm the complex wavelet
function:

$$\int_{-\infty}^{\infty} \Psi^{*}(f)\,\Psi(f)\,df = 1 \qquad (3),$$

whereby Ψ* references the conjugated-complex wavelet function.
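A discrete sketch of Equations (2) and (3), assuming the wavelet is sampled on a uniform grid with spacing df and the Hilbert transform is computed with the common DFT method (positive-frequency bins multiplied by -j, negative-frequency bins by +j). The sampling parameters and function names are assumptions; a plain O(N²) DFT keeps the example within the standard library.

```python
import cmath
import math


def dft(x, inverse=False):
    """Plain O(N^2) discrete Fourier transform; the inverse includes the 1/N factor."""
    n = len(x)
    sign = 1.0 if inverse else -1.0
    out = [sum(x[m] * cmath.exp(sign * 2j * math.pi * k * m / n) for m in range(n))
           for k in range(n)]
    return [v / n for v in out] if inverse else out


def hilbert(x):
    """Discrete Hilbert transform: multiply the spectrum by -j*sign(k), transform back."""
    n = len(x)
    spectrum = dft(x)
    for k in range(n):
        if k == 0 or 2 * k == n:
            spectrum[k] = 0j          # DC and Nyquist bins carry no quadrature part
        elif 2 * k < n:
            spectrum[k] *= -1j        # positive frequencies
        else:
            spectrum[k] *= 1j         # negative frequencies
    return [v.real for v in dft(spectrum, inverse=True)]


def complex_wavelet(psi, df):
    """Equations (2) and (3): Psi = c * (psi + j*H{psi}) with sum |Psi|^2 * df = 1."""
    raw = [complex(p, q) for p, q in zip(psi, hilbert(psi))]
    energy = sum(abs(v) ** 2 for v in raw) * df    # discrete form of Equation (3)
    c = 1.0 / math.sqrt(energy)                    # norming constant c
    return [c * v for v in raw], c


sigma, df, n = 20.0, 1.0, 128                      # assumed sampling of the f axis
freqs = [(k - n // 2) * df for k in range(n)]
psi = [(1 - (f / sigma) ** 2) * math.exp(-0.5 * (f / sigma) ** 2) for f in freqs]
Psi, c = complex_wavelet(psi, df)
print(round(sum(abs(v) ** 2 for v in Psi) * df, 9))  # 1.0
```

The unnormalized wavelet is built with c = 1 and the norming constant is applied afterwards; since the Hilbert transform is linear, this is equivalent to choosing c in Equation (1) directly.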

Figure 3 shows the cascaded application of the wavelet transformation. A 
signal 301 is filtered both by a high-pass HP1 302 as well as by a low-pass TP1 305. 
In particular, a sub-sampling thereby occurs, i.e. the plurality of values to be stored is
reduced per filter. An inverse wavelet transformation assures that the original signal
301 can in turn be reconstructed from the low-pass part TP1 305 and the high-pass
part HP 1 304. 

Filtering in the high-pass HP1 302 is separated according to real part Re1
303 and imaginary part Im 1 304. 
Following the low-pass filter TP 1 305, the signal 310 is filtered anew both

by a high-pass HP2 306 as well as by a low-pass TP2 309. The high-pass HP2 306 
again comprises a real part Re2 307 and an imaginary part Im2 308. Following the 
second transformation stage 311, the signal is filtered again, etc.

When a (FFT-transformed) short-time spectrum with 256 values is 
assumed, then eight transformation steps are implemented (sub-sampling rate: 1/2)
until the signal from the last low-pass filter TP8 corresponds to the constant part. 

Figure 4 shows various transformation stages of the wavelet 
transformation, divided according to low-pass parts (Figures 4A, 4C and 4E) and 
high-pass parts (Figures 4B, 4D and 4F). 
The basic frequency of the spoken expression can be seen from the
high-pass part according to Figure 4B. In addition to the fluctuations in the amplitude, a
dominating periodicity in the wavelet-filtered spectrum, the basic frequency of the 
speaker, can be clearly recognized. On the basis of the basic frequency, it is possible 
to adapt predetermined expressions to one another in the speech synthesis or to define 
suitable expressions from a data bank with predetermined expressions.
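The periodicity reading can be sketched as follows, assuming a crude high-pass (subtracting a moving average) as a stand-in for the first-stage wavelet high-pass part and an autocorrelation search for the dominant lag. The synthetic spectrum, the 31.25 Hz bin spacing (256 bins at an assumed 8 kHz rate) and all names are illustrative assumptions; harmonics spaced 4 bins apart correspond to a basic frequency of 125 Hz.

```python
import math


def remove_trend(values, width=9):
    """Crude high-pass: subtract a centred moving average of `width` values."""
    half = width // 2
    out = []
    for i, v in enumerate(values):
        window = values[max(0, i - half): i + half + 1]
        out.append(v - sum(window) / len(window))
    return out


def dominant_lag(values, min_lag=2, max_lag=None):
    """Lag with the largest autocorrelation, i.e. the dominant periodicity in bins."""
    n = len(values)
    max_lag = max_lag or n // 2

    def autocorrelation(lag):
        return sum(values[i] * values[i + lag] for i in range(n - lag))

    return max(range(min_lag, max_lag + 1), key=autocorrelation)


bin_hz = 31.25                 # 256-point spectrum at an assumed 8 kHz rate
harmonic_spacing = 4           # harmonics every 4 bins -> 125 Hz
spectrum = [math.exp(-k / 60.0) * (1.0 + math.cos(2 * math.pi * k / harmonic_spacing))
            for k in range(128)]
lag = dominant_lag(remove_trend(spectrum), max_lag=20)
print(lag, lag * bin_hz)  # 4 125.0
```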

The formants of the voice signal excerpt (the length of the voice signal 
excerpt corresponds to about double the basic frequency) are shown as pronounced 
minimums and maximums in the low-pass part of Figure 4C. The formants represent 
resonant frequencies in the vocal tract of the speaker. The clear presentability of the 
formants enables an adaptation and/or a selection of suitable sound components in the
concatenative speech synthesis. 

The huskiness of a voice can be determined in the low-pass part of the 
penultimate transformation stage (given 256 frequency values in the original signal: 



TP7). The descent of the course of the curve between maximum Mx and minimum 
Mi characterizes the degree of the huskiness. 
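One hypothetical way to turn the descent between Mx and Mi into a number is the average drop per value from the maximum to the lowest value that follows it. The measure and the sample curves below are assumptions for illustration; the description does not specify a formula.

```python
def huskiness_descent(curve):
    """Hypothetical huskiness measure: mean drop per bin from the maximum (Mx)
    to the lowest value that follows it (Mi) in a low-pass curve."""
    i_max = max(range(len(curve)), key=lambda i: curve[i])
    following = range(i_max + 1, len(curve))
    if not following:
        raise ValueError("the maximum is the last value; no descent to measure")
    i_min = min(following, key=lambda i: curve[i])
    return (curve[i_max] - curve[i_min]) / (i_min - i_max)


clear_voice = [0.2, 0.9, 1.0, 0.8, 0.5, 0.4, 0.35]   # hypothetical TP7-like curves
husky_voice = [0.2, 0.9, 1.0, 0.6, 0.1, 0.0, 0.05]
print(round(huskiness_descent(clear_voice), 4))  # 0.1625
print(round(huskiness_descent(husky_voice), 4))  # 0.3333
```

A steeper descent yields a larger value, matching the reading that the descent characterizes the degree of huskiness.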

Said three speaker-specific characteristics are thus identified and can be 
intentionally influenced for the speech synthesis. It is thereby of particular 
significance that, in inverse wavelet transformation, the manipulation of a single
speaker-specific characteristic influences only this; the other perceptibly relevant
quantities remain unaffected. The basic frequency can thus be deliberately
adjusted without the huskiness of the voice being thereby influenced. 

Another possible utilization is comprised in the selection of a suitable 

sound segment for concatenative linking with another sound segment, whereby the
two sound segments were additionally recorded by different speakers in different 
contexts. With determination of spectral speech characteristics, a suitable sound 
segment to be linked can be found since, with the characteristics, criteria are known 
that automatically enable a comparison of sound segments to one another according to 

specific rules and, thus, a selection of the suitable sound segment.
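A rule-based selection of this kind might be sketched as below. The record type, the two rules (a basic-frequency tolerance followed by the closest huskiness) and the 15 Hz tolerance are all hypothetical choices, not rules taken from the description.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """A stored sound segment with its speaker-specific characteristics (hypothetical)."""
    name: str
    basic_frequency_hz: float
    huskiness: float


def select_segment(target, candidates, f0_tolerance_hz=15.0):
    """Rule 1: keep candidates whose basic frequency is within a tolerance of the target.
    Rule 2: among those, choose the one with the closest huskiness.
    Returns None when no candidate satisfies rule 1."""
    admissible = [c for c in candidates
                  if abs(c.basic_frequency_hz - target.basic_frequency_hz) <= f0_tolerance_hz]
    if not admissible:
        return None
    return min(admissible, key=lambda c: abs(c.huskiness - target.huskiness))


left = Segment("a (speaker 1)", 120.0, 0.20)
pool = [
    Segment("n (speaker 2)", 210.0, 0.21),   # closest huskiness, basic frequency too far off
    Segment("n (speaker 3)", 128.0, 0.35),
    Segment("n (speaker 4)", 115.0, 0.24),
]
print(select_segment(left, pool).name)  # n (speaker 4)
```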

Figure 5 shows steps of a concatenative speech synthesis. A data bank is 
produced with a predetermined set of naturally spoken language of different speakers, 
whereby sound segments in the naturally spoken language are identified and stored. 
Numerous representatives of the various sound segments of a language derive that can 

be accessed by the data bank. The sound segments are, in particular, phonemes of a

language or a concatenation of such phonemes. The possibilities in the compilation of 
new words are all the greater the smaller the sound segment is. Thus, the German 
language comprises a predetermined set of approximately 40 phonemes that suffice 
for the synthesis of nearly all words of the language. Different acoustic contexts are 

thereby to be taken into consideration dependent on the word in which the phoneme
occurs. It is then important to embed the individual phonemes into the acoustic 
context such that discontinuities, which human hearing senses as unnatural and 
"synthetic", are avoided. As mentioned, the sound segments stem from different 
speakers and thus exhibit different speaker-specific characteristics. In order to
synthesize an expression that has as natural an effect as possible, it is important to
minimize the discontinuities. This can ensue be adaptation of the identifiable and 
modifiable speaker-specific characteristics or by selecting suitable sound segments 
from the data bank, whereby the speaker-specific characteristics likewise represent a 
critical aid in the selection.

By way of example, Figure 5 shows two sounds A 507 and B 508 that 
respectively exhibit individual sound segments 505 or, respectively, 506. The sounds 
A 507 and B 508 respectively derive from a spoken expression, whereby the sound A 
507 is clearly distinct from the sound B 508. A parting line 509 indicates whereby the 

sound A 507 is to be linked to the sound B 508. In the present case, the first three

sound segments of the sound A 507 are to be concatenatively linked with the last three 
sound segments of the sound B 508. 

A temporal stretching or compression (see arrow 503) of the sound 
segments is implemented along the parting line 509 in order to avoid the 

discontinuous impression at the transition 509.

One version is comprised in an abrupt transition of the sounds parted 
along the parting line 509. However, said discontinuities that human hearing 
perceives as disturbing thereby occur. When, in contrast, a sound C is compiled [...] 
that the sound segments within a transition region 501 or 502 are considered, whereby 

a spectral distance criterion is adapted between two sound segments that can be
allocated to one another in the respective transition region 501 or 502 (gradual 
transition between the sound segments). The Euclidean distance between the 
coefficients that are relevant in this region is utilized as the distance criterion, 
especially in the wavelet space. 
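The benefit of a gradual transition can be illustrated with the Euclidean distance between coefficient vectors. This sketch assumes linear interpolation of the wavelet coefficients across a transition region of three frames; the coefficient values, the interpolation scheme and the helper name gradual_transition are hypothetical.

```python
import math


def gradual_transition(frame_a, frame_b, steps):
    """Linearly interpolate coefficient vectors across `steps` intermediate frames."""
    weights = [k / (steps + 1) for k in range(1, steps + 1)]
    return [[(1.0 - t) * a + t * b for a, b in zip(frame_a, frame_b)] for t in weights]


last_a = [1.0, 0.8, 0.3, 0.1]    # hypothetical coefficients at the end of sound A
first_b = [0.2, 0.9, 0.7, 0.4]   # hypothetical coefficients at the start of sound B

abrupt = math.dist(last_a, first_b)                  # Euclidean distance, abrupt join
frames = [last_a] + gradual_transition(last_a, first_b, steps=3) + [first_b]
largest_step = max(math.dist(p, q) for p, q in zip(frames, frames[1:]))
print(round(abrupt, 4), round(largest_step, 4))      # each step is a quarter of the jump
```

With three intermediate frames every step covers one quarter of the abrupt distance, so no single jump between adjacent frames is as large as the direct join.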
Patent Claims

1. Method for determining spectral speech characteristics in a spoken
expression,
a) whereby the expression is digitalized;
b) whereby the digitalized expression is subjected to a wavelet
transformation;
c) whereby the speaker-specific characteristics are defined on the basis of
different transformation stages of the wavelet transformation.

2. Method according to claim 1, whereby a windowed transformation
of the digitalized expression into a frequency domain is implemented before the
wavelet transformation.

3. Method according to claim 2, whereby the transformation into the
frequency domain is implemented with fast Fourier transformation.

4. Method according to one of the preceding claims, whereby a low-pass
part and a high-pass part of a signal to be transformed are determined in each
stage of the wavelet transformation.

5. Method according to one of the preceding claims, whereby a high-pass
part is subdivided according to a real part and an imaginary part.

6. Method according to one of the preceding claims, whereby the
wavelet transformation comprises a plurality of transformation stages, whereby the
last transformation stage supplies a constant part of the expression in a repeated
low-pass filtering corresponding to the plurality of transformation stages.



7. Method according to one of the preceding claims, whereby the 
speaker-specific characteristics are determined by 

a) a basic frequency of the spoken expression; 

b) spectral envelope; 

c) a huskiness of the spoken expression. 

8. Employment of the method according to one of the claims 1 
through 7 for speech synthesis, whereby individual speaker-specific characteristics are 
adapted in view of a natural sounding concatenation of speech sounds. 

9. Employment of the method according to one of the claims 1 
through 7 for speech synthesis, whereby those speech sounds from a predetermined 
data set that assure a natural sounding concatenation of speech sounds are selected on 
the basis of individual spectral speech characteristics. 

10. Arrangement for determining spectral speech characteristics in a 
spoken expression, comprising a processor unit that is configured such that the 
following steps can be implemented: 



a) the expression is digitalized; 

b) the digitalized expression is subjected to a wavelet transformation; 

c) the speaker-specific characteristics are defined on the basis of different 
transformation stages of the wavelet transformation.

Abstract

METHOD AND ARRANGEMENT FOR DETERMINING SPECTRAL SPEECH 
CHARACTERISTICS IN A SPOKEN EXPRESSION 

Spectral speech characteristics are determined in a naturally spoken 
expression, whereby the expression is digitalized and subjected to a wavelet
transformation. The speaker-specific characteristics proceed from the different 
transformation stages of the wavelet transformation. In the course of a speech 
synthesis, these characteristics can be compared to characteristics of other expressions 
in order to generate a synthetic speech signal that sounds continuous to the human ear. 
Alternatively, the characteristics can also be deliberately modified in order to
counter a perceptive dissonance.